feat(source/cloud-storage): add Cloud Storage source with list_objects and read_object tools #3081

Merged: Yuan325 merged 17 commits into googleapis:main from huangjiahua:feat/cloud-storage-source on Apr 22, 2026
@huangjiahua huangjiahua commented Apr 16, 2026

Description

Adds Google Cloud Storage as a first-class source in MCP Toolbox, enabling LLM agents to work with objects across buckets in a GCP project. The source is project-scoped and authenticates via Application Default Credentials, mirroring Firestore/Bigtable.

This first PR ships the source plus two read-only tools from the approved design (14 total):

  • cloud-storage-list-objects — prefix filter, delimiter-based grouping (returns prefixes), and pagination via max_results / page_token. Passes through whatever metadata the GCS client returns (*storage.ObjectAttrs) so we don't have to plumb new fields later.
  • cloud-storage-read-object — reads an object's bytes, textual data only, with optional HTTP-style byte ranges (bytes=0-999, bytes=-500, bytes=500-).
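
The three range forms above can be sketched as a small parser. This is only an illustration, assuming the result feeds `storage.ObjectHandle.NewRangeReader(ctx, offset, length)`, where a negative length means "read to the end" and a negative offset counts back from the end of the object. The name `parseByteRange` is hypothetical; the PR's own `parseRange` helper may behave differently.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseByteRange is a hypothetical helper (not the PR's parseRange). It turns
// an HTTP-style range into an (offset, length) pair in the shape that
// NewRangeReader accepts: length -1 reads to the end, and a negative offset
// selects the last |offset| bytes.
func parseByteRange(s string) (offset, length int64, err error) {
	spec, ok := strings.CutPrefix(s, "bytes=")
	if !ok {
		return 0, 0, errors.New(`range must start with "bytes="`)
	}
	first, last, ok := strings.Cut(spec, "-")
	if !ok || (first == "" && last == "") {
		return 0, 0, fmt.Errorf("malformed range %q", s)
	}
	if first == "" { // suffix form: bytes=-N
		n, perr := strconv.ParseInt(last, 10, 64)
		if perr != nil || n <= 0 {
			return 0, 0, fmt.Errorf("malformed suffix range %q", s)
		}
		return -n, -1, nil
	}
	start, perr := strconv.ParseInt(first, 10, 64)
	if perr != nil || start < 0 {
		return 0, 0, fmt.Errorf("malformed range start in %q", s)
	}
	if last == "" { // open-ended form: bytes=N-
		return start, -1, nil
	}
	end, perr := strconv.ParseInt(last, 10, 64)
	if perr != nil || end < start {
		return 0, 0, fmt.Errorf("malformed range end in %q", s)
	}
	// Both bounds are inclusive, so the length is end-start+1.
	return start, end - start + 1, nil
}

func main() {
	for _, r := range []string{"bytes=0-999", "bytes=-500", "bytes=500-", "chars=1-2"} {
		off, n, err := parseByteRange(r)
		fmt.Println(r, off, n, err)
	}
}
```

Rejecting malformed input in the parser, before any GCS call, lets the tool report it as an agent-fixable error rather than a server failure.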

GCS-aware error categorization (per DEVELOPER.md) is implemented in a new cloudstoragecommon helper that maps GCS sentinels and *googleapi.Error codes to Agent errors (missing bucket/object, bad request, unsatisfiable range) vs. Server errors (auth, IAM denial, quota, 5xx, context cancellation). This replaces the coarse util.ProcessGcpError for the two new tools.
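
A rough sketch of the Agent/Server split described above, kept free of external dependencies. `statusError` is a hypothetical stand-in for `*googleapi.Error`, and `Category`, `classify`, and the "unknown errors count as Server errors" fallback are assumptions made for illustration, not the actual `cloudstoragecommon` API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// Category says who is expected to fix the failure (hypothetical type).
type Category int

const (
	AgentError  Category = iota // the caller can retry with different input
	ServerError                 // needs operator attention: auth, quota, outage
)

func (c Category) String() string {
	if c == AgentError {
		return "agent"
	}
	return "server"
}

// statusError stands in for *googleapi.Error so the sketch has no external
// dependencies; only the HTTP status code matters here.
type statusError struct{ Code int }

func (e *statusError) Error() string { return fmt.Sprintf("http status %d", e.Code) }

// classify maps an error to a category. Unknown errors fall back to
// ServerError, which is an assumption of this sketch.
func classify(err error) Category {
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return ServerError
	}
	var se *statusError
	if errors.As(err, &se) {
		switch se.Code {
		case http.StatusBadRequest, // bad request
			http.StatusNotFound, // missing bucket or object
			http.StatusRequestedRangeNotSatisfiable: // unsatisfiable range
			return AgentError
		}
	}
	return ServerError
}

func main() {
	errs := []error{
		&statusError{Code: http.StatusNotFound},
		&statusError{Code: http.StatusForbidden},
		fmt.Errorf("read: %w", context.DeadlineExceeded),
	}
	for _, err := range errs {
		fmt.Printf("%v -> %v\n", err, classify(err))
	}
}
```

Using `errors.Is` and `errors.As` rather than direct comparison keeps the classification working when callers wrap the underlying error.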

Remaining 12 tools from the design doc (list_buckets, create_bucket, copy/move/delete_object, etc.) will land in follow-up PRs.

CI note: the cloud-storage shard in .ci/integration.cloudbuild.yaml expects CLOUD_STORAGE_PROJECT=$PROJECT_ID and requires the test service account to have a Cloud Storage admin role in the test project. The integration test creates its own UUID-suffixed bucket and removes it with defer-based cleanup.

PR Checklist

  • Make sure you reviewed CONTRIBUTING.md
  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea (communicated internally)
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
  • Make sure to add ! if this involves a breaking change

What's included

  • New source: internal/sources/cloudstorage/ (+ YAML-parse unit tests)
  • Two tools: internal/tools/cloudstorage/cloudstoragelistobjects/, .../cloudstoragereadobject/ (+ YAML-parse + range-parser unit tests)
  • New cloudstoragecommon error classifier (+ 17-case unit test covering sentinels, HTTP statuses, context.Canceled/DeadlineExceeded, and fallback)
  • Integration test: tests/cloudstorage/cloud_storage_integration_test.go — 12 sub-tests against a real bucket (self-created, self-cleaned)
  • Docs: docs/en/integrations/cloud-storage/ (source + both tool pages; passes .ci/lint-docs-{source,tool}-page.sh)
  • CI shard: cloud-storage in .ci/integration.cloudbuild.yaml
  • Dependency: cloud.google.com/go/storage v1.62.1

Opening as draft for initial review — happy to split the error-classifier refactor into a separate commit if reviewers prefer.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds Google Cloud Storage integration, introducing a new source and tools for listing and reading objects. The implementation includes configuration, error handling, and tests. Feedback recommends capping listing page sizes at 1000 for consistency, implementing memory safety limits when reading objects, and updating documentation titles to include the 'Tool' suffix.

Comment thread internal/sources/cloudstorage/cloudstorage.go Outdated
Comment thread internal/sources/cloudstorage/cloudstorage.go
@huangjiahua huangjiahua marked this pull request as ready for review April 16, 2026 23:28
@huangjiahua huangjiahua requested a review from a team as a code owner April 16, 2026 23:28
…s and read_object tools

Adds a new project-scoped `cloud-storage` source using ADC, plus two read-only
tools: `cloud-storage-list-objects` (with prefix/delimiter/pagination) and
`cloud-storage-read-object` (with HTTP-style byte range and base64 payload).

Introduces a GCS-aware error classifier in `cloudstoragecommon` that splits
failures into Agent errors (missing bucket/object, bad request, unsatisfiable
range) and Server errors (auth, IAM denial, quota, 5xx, cancellation) per
DEVELOPER.md, replacing the coarse-grained `util.ProcessGcpError`.

Ships YAML-parse unit tests, an error-classifier unit test, a range-parser unit
test, a live-GCS integration test (12 sub-tests, UUID-suffixed bucket with
self-cleanup), docs under `docs/en/integrations/cloud-storage/`, and a
`cloud-storage` CI shard.

The remaining 12 tools from the approved design doc land in follow-up PRs.
…dObject at 1 MiB

- ListObjects: pageSize() now clamps to the GCS API max of 1000 so callers that
  pass a larger max_results don't pre-allocate oversized buffers.
- ReadObject: reject objects/ranges over 1 MiB with the new sentinel
  cloudstoragecommon.ErrReadSizeLimitExceeded, which the classifier maps to an
  Agent error so the LLM can retry with a narrower 'range'.
- Docs + integration tests updated (two new sub-tests: oversize rejection and
  oversize-narrowed-by-range success).
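
One way a size cap like this could be enforced, sketched with the standard library only. The sentinel name `ErrReadSizeLimitExceeded` comes from the commit message; `readCapped`, `readCappedBytes`, and the tiny limit used in `main` are hypothetical and chosen for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// ErrReadSizeLimitExceeded mirrors the sentinel named in the commit message;
// the message text here is made up.
var ErrReadSizeLimitExceeded = errors.New("object or range exceeds the read size limit")

// readCapped reads at most limit bytes from r. It asks for one extra byte so
// that an oversize payload can be detected without buffering all of it.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, ErrReadSizeLimitExceeded
	}
	return data, nil
}

// readCappedBytes is a convenience wrapper for in-memory input.
func readCappedBytes(b []byte, limit int64) ([]byte, error) {
	return readCapped(bytes.NewReader(b), limit)
}

func main() {
	const limit = 8 // bytes; deliberately tiny for the demo
	for _, payload := range []string{"12345678", "123456789"} {
		data, err := readCappedBytes([]byte(payload), limit)
		fmt.Printf("%q -> %d bytes, err=%v\n", payload, len(data), err)
	}
}
```

Reading through `io.LimitReader` bounds memory use even when the object size is not known up front, which is the OOM guard the commit describes.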
… MiB

8 MiB gives agents more headroom for typical text/JSON/log payloads while
still guarding against OOM. Doc and the oversize integration seed updated to
match.
…ckage

DefaultMaxReadBytes doesn't belong in errors.go — the limit is a source-side
invariant, not an error-classification concern. The sentinel
ErrReadSizeLimitExceeded stays in cloudstoragecommon because the classifier
still needs to recognize it.
…geSize bounds

Cleanup loop in the integration test was treating any iterator error as
iterator.Done; now distinguishes the two and logs non-Done errors so
flaky teardowns are debuggable. Also adds an internal unit test for
pageSize covering 0, negative, in-range, and over-cap inputs.
MCP tool results only carry text today, so the previous base64-encoded
content was unusable by the LLM. Validate object bytes with utf8.Valid
and return plain-text content; non-UTF-8 objects surface as an
agent-fixable ErrBinaryContent error. TODO notes mark the spots to
revisit once MCP supports embedded resources.
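
The UTF-8 gate described in this commit can be sketched as follows. `ErrBinaryContent` is the sentinel named above; the helper name `textContent` and the error text are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// ErrBinaryContent mirrors the sentinel named in the commit message; the
// message text here is made up.
var ErrBinaryContent = errors.New("object is not valid UTF-8 text")

// textContent (hypothetical name) returns the bytes as a string only when
// they form valid UTF-8, so the result can be placed in a text-only payload.
func textContent(data []byte) (string, error) {
	if !utf8.Valid(data) {
		return "", ErrBinaryContent
	}
	return string(data), nil
}

func main() {
	for _, data := range [][]byte{[]byte("héllo, wörld"), {0xff, 0xfe, 0x00}} {
		s, err := textContent(data)
		fmt.Printf("%q err=%v\n", s, err)
	}
}
```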
@huangjiahua huangjiahua force-pushed the feat/cloud-storage-source branch from 91a222a to 4919821 Compare April 17, 2026 19:30
@Yuan325 Yuan325 left a comment

Hi @huangjiahua, thank you for the contribution! Please let me know if you need any clarification.

Comment thread internal/sources/cloudstorage/cloudstorage.go Outdated
Comment thread internal/sources/cloudstorage/pagesize_internal_test.go Outdated
Comment thread internal/sources/cloudstorage/cloudstorage.go Outdated
Comment thread internal/sources/cloudstorage/cloudstorage.go
Comment thread internal/sources/cloudstorage/cloudstorage.go Outdated
Comment thread tests/cloudstorage/cloud_storage_integration_test.go Outdated
Comment thread tests/cloudstorage/cloud_storage_integration_test.go Outdated
Comment thread tests/cloudstorage/cloud_storage_integration_test.go
The storage.Client is an implementation detail; external callers that
need it use the StorageClient() accessor, so the field itself doesn't
need to be exported.
…e tests into single test file per package

Merge TestPageSize into cloudstorage_test.go and TestParseRange into
cloudstoragereadobject_test.go. Both test files now use the internal
package so they can exercise the unexported pageSize and parseRange
helpers directly, removing the need for separate *_internal_test.go
files.
…with AgentError

Previously, values above the GCS per-page cap of 1000 were silently
clamped by the pageSize helper, which could confuse agents when the
returned page was smaller than requested. Validate max_results during
Invoke and return an AgentError so the limit is explicit. Docs and the
parameter description are updated to match; the pageSize clamp remains
as defense in depth. A unit test covers the rejection path and an
integration test exercises it over HTTP.
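
A minimal sketch of the explicit rejection described here. The cap of 1000 comes from the commit message; `agentError`, `validateMaxResults`, and the error wording are hypothetical stand-ins for whatever the tool's Invoke path uses.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPageSize is the GCS per-page cap cited in the commit message.
const maxPageSize = 1000

// agentError is a hypothetical stand-in for the toolbox's agent-facing error
// type; it marks failures the caller can fix by changing its input.
type agentError struct{ msg string }

func (e *agentError) Error() string { return e.msg }

// validateMaxResults rejects values above the cap instead of clamping them,
// so the agent is told why it would not get the page size it asked for.
func validateMaxResults(n int) error {
	if n > maxPageSize {
		return &agentError{msg: fmt.Sprintf("max_results must be at most %d, got %d", maxPageSize, n)}
	}
	return nil
}

func main() {
	for _, n := range []int{100, 1000, 1001} {
		err := validateMaxResults(n)
		var ae *agentError
		fmt.Printf("max_results=%d err=%v agentFixable=%v\n", n, err, errors.As(err, &ae))
	}
}
```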
…age_token inputs

Add two integration sub-tests confirming that empty-string inputs are
accepted by the GCS client as expected: ListObjects with empty prefix
and delimiter returns an unfiltered listing, and an empty page_token
returns the first page rather than erroring. These cases address
review questions about whether the values passed through to
storage.Query and iterator.NewPager are safe when unset.
… simplify storage client init in integration test

Drop the initStorageClient wrapper and the option.WithUserAgent call;
the integration test now uses storage.NewClient(ctx) directly, matching
the suggestion in review and removing a needless indirection.
… table-drive integration tests and reuse RunToolGetTestByName

Replace the bespoke runCloudStorageToolGetTest with two
tests.RunToolGetTestByName calls that assert the full manifest for
each tool. Convert the list_objects and read_object sub-tests to
table-driven form: each case declares a request body plus substring,
content, or contentType expectations, driven by a single assertion
loop. The inherently two-step pagination test stays as its own
t.Run. Behaviour is unchanged; the file is ~220 lines shorter in
boilerplate.
… drop tool-get manifest test

Remove runCloudStorageToolGetTest entirely. The manifest-shape check
it performed was redundant: unit tests already cover ParseFromYaml for
each config, and the invoke sub-tests exercise the tool handlers over
HTTP. Keeping a full-manifest deep-equal here just duplicates that
coverage and has to be updated whenever parameter docs change.
The tool layer already rejects max_results above 1000, so the source-level
cap is redundant. Inline the minimal 0-to-default conversion needed by
iterator.NewPager (which errors on pageSize <= 0) and remove the pageSize
helper and its test.
@huangjiahua huangjiahua requested a review from Yuan325 April 22, 2026 17:02
@Yuan325 Yuan325 assigned Yuan325 and unassigned duwenxin99 Apr 22, 2026
Comment thread tests/cloudstorage/cloud_storage_integration_test.go Outdated
Comment thread tests/cloudstorage/cloud_storage_integration_test.go Outdated
Comment thread tests/cloudstorage/cloud_storage_integration_test.go Outdated
Comment thread tests/cloudstorage/cloud_storage_integration_test.go Outdated
… assert tool-get manifests and rename to snake_case

Add RunToolGetTestByName coverage for both cloud-storage tools, matching
the looker integration pattern. Rename the test config identifiers
(`my_instance`, `my_list_objects`, `my_read_object`) to snake_case per
the tool-name convention.
@huangjiahua huangjiahua requested a review from Yuan325 April 22, 2026 21:15
Yuan325 commented Apr 22, 2026

/gcbrun

Yuan325 commented Apr 22, 2026

/gcbrun

@Yuan325 Yuan325 merged commit da27b37 into googleapis:main Apr 22, 2026
22 checks passed
github-actions Bot pushed a commit that referenced this pull request Apr 22, 2026
github-actions Bot pushed a commit to Jaleel-zhu/genai-toolbox that referenced this pull request Apr 22, 2026
github-actions Bot pushed a commit to renovate-bot/googleapis-_-genai-toolbox that referenced this pull request Apr 23, 2026
github-actions Bot pushed a commit to pepe57/genai-toolbox that referenced this pull request Apr 23, 2026
github-actions Bot pushed a commit to sumedhdk22/genai-toolbox that referenced this pull request Apr 23, 2026